Abstract



Numerous entropy-type characteristics (functionals) generalizing Rényi entropy are widely used in mathematical statistics, physics, information theory, and signal processing for characterizing uncertainty in probability distributions and distribution identification problems. We consider estimators of some entropy (integral) functionals for discrete and continuous distributions based on the number of epsilon-close vector records in the corresponding independent and identically distributed samples from two distributions. The estimators form a triangular scheme of generalized U-statistics. We show the asymptotic properties of these estimators (e.g., consistency and asymptotic normality). The results can be applied in various problems in computer science and mathematical statistics (e.g., approximate matching for random databases, record linkage, image matching).

1 Introduction

Let $X$ and $Y$ be $d$-dimensional random vectors with discrete or continuous distributions $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively. In information theory and statistics, there are various generalizations of Shannon entropy (see Shannon, 1948) characterizing uncertainty in $\mathcal{P}_X$ and $\mathcal{P}_Y$, for example, the Rényi entropy (Rényi, 1961, 1970),

$$h_s := \frac{1}{1-s}\log\Big(\int_{\mathbb{R}^d} p_X(x)^s\,dx\Big), \quad s \ne 1,$$

and the (differential) variability for approximate record matching in random databases

$$v := -\log\int_{\mathbb{R}^d} p_X(x)\,p_Y(x)\,dx,$$

where $p_X(x), p_Y(x)$, $x \in \mathbb{R}^d$, are the densities of $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively (see Seleznjev and Thalheim, 2003, 2008). Henceforth we use $\log x$ to denote the natural logarithm of $x$. More generally, for non-negative integers $r_1, r_2$, $\mathbf{r} := (r_1, r_2)$, and $r := r_1 + r_2$, we consider the Rényi entropy functionals

$$q_{\mathbf r} := \int_{\mathbb{R}^d} p_X(x)^{r_1}\,p_Y(x)^{r_2}\,dx,$$

and, for the discrete case, $\mathcal{P}_X = \{p_X(k), k \in \mathbb{N}^d\}$ and $\mathcal{P}_Y = \{p_Y(k), k \in \mathbb{N}^d\}$,

$$q_{\mathbf r} := \sum_{k} p_X(k)^{r_1}\,p_Y(k)^{r_2},$$

where $q_{\mathbf r} = q_{r_1,r_2}$. Then, for example, the Rényi entropy $h_s = h_{s,0} = \log(q_{s,0})/(1-s)$ and the variability $v = h_{1,1} = -\log(q_{1,1})$. Let $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$ be mutually independent samples of independent and identically distributed (i.i.d.) observations from $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively. We consider the problem of estimating the entropy-type functionals $q_{\mathbf r}$ and related characteristics for $\mathcal{P}_X$ and $\mathcal{P}_Y$ from the samples $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$.

Various entropy applications in statistics (e.g., classification and distribution identification prob- 
lems) and in computer science and bioinformatics (e.g., average case analysis for random databases, 
approximate pattern and image matching) are investigated in, e.g., Kapur (1989), Kapur and Kesavan
(1992), Leonenko et al. (2008), Szpankowski (2001), Seleznjev and Thalheim (2003, 2008),
Thalheim (2000), Baryshnikov et al. (2009), and Leonenko and Seleznjev (2010). Some average 
case analysis problems for random databases with entropy characteristics are investigated also in 
Demetrovics et al. (1995, 1998a, 1998b). 

In our paper, we generalize the results and approach proposed in Leonenko and Seleznjev (2010), where the quadratic Rényi entropy estimation is studied for one sample. We consider properties (consistency and asymptotic normality) of kernel-type estimators based on the number of coincident (or $\varepsilon$-close) observations in $d$-dimensional samples for a more general class of entropy-type functionals. These results can be used, e.g., for evaluating asymptotic confidence intervals for the corresponding Rényi entropy functionals.

Note that our estimators of entropy-type functionals are different from those considered by Kozachenko and Leonenko (1987), Tsybakov and van der Meulen (1996), Leonenko et al. (2008), and Baryshnikov et al. (2009) (see Leonenko and Seleznjev, 2010, for a discussion).

First we introduce some notation. Throughout the paper, let $X$ and $Y$ be independent random vectors in $\mathbb{R}^d$ with distributions $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively. For the discrete case, $\mathcal{P}_X = \{p_X(k), k \in \mathbb{N}^d\}$ and $\mathcal{P}_Y = \{p_Y(k), k \in \mathbb{N}^d\}$. In the continuous case, let the distributions have densities $p_X(x)$ and $p_Y(x)$, $x \in \mathbb{R}^d$, respectively. Let $d(x,y) = |x-y|$ denote the Euclidean distance in $\mathbb{R}^d$ and $B_\varepsilon(x) := \{y : d(x,y) \le \varepsilon\}$ an $\varepsilon$-ball in $\mathbb{R}^d$ with center at $x$, radius $\varepsilon$, and volume $b_\varepsilon(d) = \varepsilon^d b_1(d)$, $b_1(d) = 2\pi^{d/2}/(d\,\Gamma(d/2))$. Denote by $p_{X,\varepsilon}(x)$ the $\varepsilon$-ball probability

$$p_{X,\varepsilon}(x) := P\{X \in B_\varepsilon(x)\},$$

and define $p_{Y,\varepsilon}(x)$ analogously. We write $I(C)$ for the indicator of an event $C$, and $|D|$ for the cardinality of a finite set $D$.
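As a quick numerical check of the volume formula, here is a minimal Python sketch (the helper names `unit_ball_volume`, `ball_volume`, and `ball_probability_mc` are hypothetical). It confirms that $b_1(d) = 2\pi^{d/2}/(d\,\Gamma(d/2))$ agrees with the equivalent form $\pi^{d/2}/\Gamma(d/2+1)$, and, assuming a standard bivariate normal $X$, compares a Monte Carlo estimate of $p_{X,\varepsilon}(0)$ with the exact value $1 - e^{-\varepsilon^2/2}$ and with the small-$\varepsilon$ approximation $b_\varepsilon(d)\,p_X(0)$.

```python
import math
import random

def unit_ball_volume(d):
    """b_1(d) = 2 * pi^(d/2) / (d * Gamma(d/2)), the volume of the unit ball in R^d."""
    return 2 * math.pi ** (d / 2) / (d * math.gamma(d / 2))

def ball_volume(eps, d):
    """b_eps(d) = eps^d * b_1(d)."""
    return eps ** d * unit_ball_volume(d)

def ball_probability_mc(draws, x, eps):
    """Monte Carlo estimate of p_{X,eps}(x) = P{X in B_eps(x)} from draws of X."""
    return sum(math.dist(s, x) <= eps for s in draws) / len(draws)

# The two closed forms of b_1(d) agree.
for d in range(1, 8):
    assert math.isclose(unit_ball_volume(d), math.pi ** (d / 2) / math.gamma(d / 2 + 1))

random.seed(1)
eps = 0.1
draws = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
est = ball_probability_mc(draws, (0.0, 0.0), eps)
exact = 1 - math.exp(-eps ** 2 / 2)            # P{|X| <= eps} for X ~ N(0, I_2)
approx = ball_volume(eps, 2) / (2 * math.pi)   # b_eps(d) * p_X(0), valid for small eps
print(est, exact, approx)
```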

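Next we define the following estimators of $q_{\mathbf r}$ when $r_1$ and $r_2$ are non-negative integers. Let the i.i.d. samples $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$ be from $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively. Denote $\mathbf n := (n_1, n_2)$, $n := n_1 + n_2$, and say that $\mathbf n \to \infty$ if $n_1, n_2 \to \infty$ and $p_n := n_1/n \to p$, $0 < p < 1$, as $\mathbf n \to \infty$.

For an integer $k$, denote by $\mathcal{S}_{m,k}$ the set of all $k$-subsets of $\{1,\dots,m\}$. For $S \in \mathcal{S}_{n_1,r_1}$, $T \in \mathcal{S}_{n_2,r_2}$, and $i \in S$, define

$$\psi^{(i)}_{\mathbf n}(S;T) := I\big(d(X_i,X_j) \le \varepsilon,\ d(X_i,Y_k) \le \varepsilon,\ \forall j \in S,\ \forall k \in T\big),$$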






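i.e., the indicator of the event that all elements in $\{X_j, j \in S\}$ and $\{Y_k, k \in T\}$ are $\varepsilon$-close to $X_i$. Note that by conditioning we have

$$\mathbf{E}\,\psi^{(i)}_{\mathbf n}(S;T) = \mathbf{E}\big(p_{X,\varepsilon}(X)^{r_1-1}\,p_{Y,\varepsilon}(X)^{r_2}\big) =: q_{\mathbf r,\varepsilon},$$

say, the $\varepsilon$-coincidence probability. Let a generalized $U$-statistic for the functional $q_{\mathbf r,\varepsilon}$ be defined as

$$Q_{\mathbf n} = Q_{\mathbf n,\mathbf r,\varepsilon} := \binom{n_1}{r_1}^{-1}\binom{n_2}{r_2}^{-1}\sum_{S \in \mathcal{S}_{n_1,r_1}}\ \sum_{T \in \mathcal{S}_{n_2,r_2}} \psi_{\mathbf n}(S;T),$$

where the symmetrized kernel is

$$\psi_{\mathbf n}(S;T) := \frac{1}{r_1}\sum_{i \in S}\psi^{(i)}_{\mathbf n}(S;T),$$

and by definition, $Q_{\mathbf n}$ is an unbiased estimator of $q_{\mathbf r,\varepsilon} = \mathbf{E}Q_{\mathbf n}$. Define for discrete and continuous distributions

$$\zeta_{1,0} := \mathrm{Var}\big(p_X(X)^{r_1-1}p_Y(X)^{r_2}\big) = q_{2r_1-1,2r_2} - q_{r_1,r_2}^2,$$
$$\zeta_{0,1} := \mathrm{Var}\big(p_X(Y)^{r_1}p_Y(Y)^{r_2-1}\big) = q_{2r_1,2r_2-1} - q_{r_1,r_2}^2,$$
$$\kappa := r_1^2\,p^{-1}\zeta_{1,0} + r_2^2\,(1-p)^{-1}\zeta_{0,1}.$$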
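The generalized $U$-statistic $Q_{\mathbf n}$ can be evaluated directly from its definition. The following minimal Python sketch (the function `coincidence_u_statistic` is a hypothetical helper, not code from the paper) enumerates all $S \in \mathcal{S}_{n_1,r_1}$ and $T \in \mathcal{S}_{n_2,r_2}$ and averages the symmetrized kernel; it assumes points given as tuples and Euclidean distance. Its cost grows like $\binom{n_1}{r_1}\binom{n_2}{r_2}$, so it is only suitable for small samples or for checking faster implementations.

```python
from itertools import combinations
from math import comb, dist

def coincidence_u_statistic(xs, ys, r1, r2, eps):
    """Brute-force Q_n: average of the symmetrized kernel psi_n(S; T) over all
    r1-subsets S of the X-sample and r2-subsets T of the Y-sample.

    xs, ys: lists of points (tuples of floats) in R^d; requires r1 >= 1, r2 >= 0.
    """
    total = 0.0
    for S in combinations(range(len(xs)), r1):
        for T in combinations(range(len(ys)), r2):
            # psi^{(i)}: every X_j (j in S) and Y_k (k in T) lies within eps of X_i
            hits = sum(
                all(dist(xs[i], xs[j]) <= eps for j in S)
                and all(dist(xs[i], ys[k]) <= eps for k in T)
                for i in S
            )
            total += hits / r1  # symmetrized kernel: average over the centres i in S
    return total / (comb(len(xs), r1) * comb(len(ys), r2))

xs = [(0.0,), (0.05,), (1.0,)]
ys = [(0.02,), (3.0,)]
print(coincidence_u_statistic(xs, ys, 1, 1, 0.1))   # estimates q_{(1,1),eps}
print(coincidence_u_statistic(xs, ys, 2, 1, 0.1))   # estimates q_{(2,1),eps}
```

With $r_2 = 0$ and an empty $Y$-sample the same function gives the one-sample statistic, since there is exactly one empty subset $T$.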

Let $\xrightarrow{D}$ and $\xrightarrow{P}$ denote convergence in distribution and in probability, respectively.

The paper is organized as follows. In Section 2 we consider estimation of Rényi entropy functionals for discrete and continuous distributions. In Section 3 we discuss some applications of the obtained estimators in average case analysis for random databases (e.g., for join optimization with approximate matching), in pattern and image matching problems, and for some distribution identification problems. Several numerical experiments demonstrate the rate of convergence in the obtained asymptotic results. Section 4 contains the proofs of the statements from the previous sections.

2 Main Results 

2.1 Discrete Distributions 

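In the discrete case, set $\varepsilon = 0$, i.e., exact coincidences are considered. Then $Q_{\mathbf n}$ is an unbiased estimator of the coincidence probability

$$q_{\mathbf r,0} = q_{\mathbf r} = \mathbf{E}\,I(X_1 = X_i = Y_j,\ i = 2,\dots,r_1,\ j = 1,\dots,r_2) = \mathbf{E}\big(p_X(X)^{r_1-1}p_Y(X)^{r_2}\big).$$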

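Let $Q_{\mathbf n,\mathbf r} := Q_{\mathbf n,\mathbf r,0}$ and

$$\kappa_{\mathbf n} := r_1^2\,p_n^{-1}\big(Q_{\mathbf n,(2r_1-1,2r_2)} - Q_{\mathbf n,\mathbf r}^2\big) + r_2^2\,(1-p_n)^{-1}\big(Q_{\mathbf n,(2r_1,2r_2-1)} - Q_{\mathbf n,\mathbf r}^2\big),$$

and $\tilde\kappa_{\mathbf n} := \max(\kappa_{\mathbf n}, 1/n)$, an estimator of $\kappa$. Denote by $H_{\mathbf n} := \log(\max(Q_{\mathbf n}, 1/n))/(1-r)$ an estimator of $h_{\mathbf r} := \log(q_{\mathbf r})/(1-r)$.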
Remark. Instead of $1/n$ in the definition of a truncated estimator, a sequence $a_n > 0$, $a_n \to 0$ as $n \to \infty$, can be used (cf. Leonenko and Seleznjev, 2010).

The next asymptotic normality theorem for the estimator $Q_{\mathbf n}$ follows straightforwardly from the general $U$-statistic theory (see, e.g., Lee, 1990; Koroljuk and Borovskich, 1994) and the Slutsky theorem.

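Theorem 1. If $\zeta_{1,0}, \zeta_{0,1} > 0$, then

$$\sqrt{n}\,(Q_{\mathbf n} - q_{\mathbf r}) \xrightarrow{D} N(0,\kappa) \quad\text{and}\quad \sqrt{n}\,(Q_{\mathbf n} - q_{\mathbf r})/\tilde\kappa_{\mathbf n}^{1/2} \xrightarrow{D} N(0,1);$$
$$\sqrt{n}\,(1-r)\,Q_{\mathbf n}\,\tilde\kappa_{\mathbf n}^{-1/2}\,(H_{\mathbf n} - h_{\mathbf r}) \xrightarrow{D} N(0,1) \quad\text{as } \mathbf n \to \infty.$$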
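For $\varepsilon = 0$ the symmetrized kernel equals one exactly when all $r_1 + r_2$ selected observations coincide, so $Q_{\mathbf n}$ reduces to counting coinciding index sets value by value. The minimal Python sketch below (hypothetical helper names) uses this to compute $Q_{\mathbf n}$, $\tilde\kappa_{\mathbf n}$, $H_{\mathbf n}$, and an approximate confidence interval for $h_{\mathbf r}$ obtained by inverting the last limit in Theorem 1. Two choices are ours, not the paper's: in the one-sample case ($r_2 = 0$, empty $Y$-sample) we set $p_n = 1$ and drop the $Y$-term of $\kappa_{\mathbf n}$, and the interval uses $\max(Q_{\mathbf n}, 1/n)$ in place of $Q_{\mathbf n}$ to avoid division by zero. The usage lines mimic the setting of Example 1 in Section 3.4.

```python
import random
from collections import Counter
from math import comb, log, sqrt

def q_hat(xs, ys, r1, r2):
    """Exact-coincidence (eps = 0) U-statistic for q_{r1,r2}.

    For each value k there are C(N_X(k), r1) * C(N_Y(k), r2) index sets whose
    observations all equal k. Observations must be hashable (tuples for
    vectors); requires r1 >= 1, n1 >= r1 and n2 >= r2.
    """
    cx, cy = Counter(xs), Counter(ys)
    hits = sum(comb(cx[k], r1) * comb(cy[k], r2) for k in cx)
    return hits / (comb(len(xs), r1) * comb(len(ys), r2))

def renyi_estimate_ci(xs, ys, r1, r2, z=1.96):
    """H_n and an approximate two-sided interval for h_r (z = 1.96: about 95%).

    Needs r = r1 + r2 >= 2, n1 >= 2 * r1 and n2 >= 2 * r2.
    """
    n1, n2 = len(xs), len(ys)
    n, r = n1 + n2, r1 + r2
    pn = n1 / n
    q = q_hat(xs, ys, r1, r2)
    kappa = r1**2 / pn * (q_hat(xs, ys, 2 * r1 - 1, 2 * r2) - q * q)
    if r2 > 0:  # the Y-term carries the factor r2^2 and is skipped when r2 = 0
        kappa += r2**2 / (1 - pn) * (q_hat(xs, ys, 2 * r1, 2 * r2 - 1) - q * q)
    kappa = max(kappa, 1 / n)        # truncated variance estimator
    q_trunc = max(q, 1 / n)          # truncation used in H_n
    h = log(q_trunc) / (1 - r)
    half = z * sqrt(kappa) / (sqrt(n) * abs(1 - r) * q_trunc)
    return h, (h - half, h + half)

# One-sample setting: Be(0.8)^3 vectors and the cubic entropy h_{3,0}.
random.seed(0)
p, d, n = 0.8, 3, 300
xs = [tuple(int(random.random() < p) for _ in range(d)) for _ in range(n)]
h, ci = renyi_estimate_ci(xs, [], 3, 0)
print(h, ci, -log((p**3 + (1 - p) ** 3) ** d) / 2)
```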


2.2 Continuous Distributions 

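In the continuous case, denote by $\tilde Q_{\mathbf n} := Q_{\mathbf n}/b_\varepsilon(d)^{r-1}$ an estimator of $q_{\mathbf r}$. Let $\tilde q_{\mathbf r,\varepsilon} := \mathbf{E}\tilde Q_{\mathbf n} = q_{\mathbf r,\varepsilon}/b_\varepsilon(d)^{r-1}$ and $v_{\mathbf n}^2 := \mathrm{Var}(\tilde Q_{\mathbf n})$.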

Henceforth, assume that $\varepsilon = \varepsilon(n) \to 0$ as $\mathbf n \to \infty$. For a sequence of random variables $U_n$, $n \ge 1$, we say that $U_n = O_P(1)$ as $n \to \infty$ if for any $\delta > 0$ there exists $A > 0$ such that $P(|U_n| > A) < \delta$ for all $n$ large enough, i.e., the family of distributions of $U_n$, $n \ge 1$, is tight; and for a numerical sequence $w_n$, $n \ge 1$, we say that $U_n = O_P(w_n)$ as $n \to \infty$ if $U_n/w_n = O_P(1)$ as $n \to \infty$. The following theorem describes the consistency and asymptotic normality properties of the estimator $\tilde Q_{\mathbf n}$.

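Theorem 2. Let $p_X(x)$ and $p_Y(x)$ be bounded and continuous, or with a finite number of discontinuity points.

(i) $v_{\mathbf n}^2 = O\big(n^{-1}\varepsilon^{-d(1-1/r)}\big)$ and $\mathbf{E}\tilde Q_{\mathbf n} \to q_{\mathbf r}$ as $\mathbf n \to \infty$; hence, if $n\varepsilon^{d(1-1/r)} \to \infty$ as $\mathbf n \to \infty$, then $\tilde Q_{\mathbf n}$ is a consistent estimator of $q_{\mathbf r}$.

(ii) If $n\varepsilon^d \to \infty$ as $\mathbf n \to \infty$ and $\zeta_{1,0}, \zeta_{0,1} > 0$, then

$$\sqrt{n}\,(\tilde Q_{\mathbf n} - \tilde q_{\mathbf r,\varepsilon}) \xrightarrow{D} N(0,\kappa) \quad\text{as } \mathbf n \to \infty.$$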
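For $\mathbf r = (1,1)$ and $d = 1$, the statistic only counts cross pairs $(X_i, Y_k)$ with $|X_i - Y_k| \le \varepsilon$, which can be done in $O(n \log n)$ time by sorting one sample. The minimal Python sketch below (hypothetical helper name; one-dimensional data assumed, so that $b_\varepsilon(1) = 2\varepsilon$ and $r - 1 = 1$) returns $\tilde Q_{\mathbf n}$ and the plug-in estimate $-\log(\max(\tilde Q_{\mathbf n}, 1/n))$ of the variability $v = h_{1,1}$. The usage lines borrow the two normal distributions of Example 2 in Section 3.4, whose true variability is $\log(2\sqrt{\pi}\,e)$.

```python
import bisect
import math
import random

def variability_estimate(xs, ys, eps):
    """Return (tilde Q_n, estimate of v = h_{1,1}) for one-dimensional samples.

    Q_n is the fraction of cross pairs (X_i, Y_k) with |X_i - Y_k| <= eps;
    dividing by b_eps(1)^(r - 1) = 2 * eps (r = 2) gives tilde Q_n.
    """
    ys_sorted = sorted(ys)
    pairs = 0
    for x in xs:
        # number of Y's in the closed interval [x - eps, x + eps]
        pairs += bisect.bisect_right(ys_sorted, x + eps) - bisect.bisect_left(ys_sorted, x - eps)
    q_tilde = pairs / (len(xs) * len(ys)) / (2 * eps)
    n = len(xs) + len(ys)
    v_hat = -math.log(max(q_tilde, 1 / n))   # log(max(Q, 1/n)) / (1 - r) with r = 2
    return q_tilde, v_hat

# N(0, 3/2) and N(2, 1/2) with variances as second parameter; n1 = 100, n2 = 200.
random.seed(2)
xs = [random.gauss(0.0, math.sqrt(1.5)) for _ in range(100)]
ys = [random.gauss(2.0, math.sqrt(0.5)) for _ in range(200)]
print(variability_estimate(xs, ys, eps=0.1), math.log(2 * math.sqrt(math.pi) * math.e))
```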

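In order to evaluate the functional $q_{\mathbf r}$, we denote by $H^{(\alpha)}(C)$, $0 < \alpha \le 2$, $C > 0$, the linear space of bounded and continuous functions on $\mathbb{R}^d$ that satisfy an $\alpha$-Hölder condition if $0 < \alpha \le 1$, or, if $1 < \alpha \le 2$, have continuous partial derivatives satisfying an $(\alpha-1)$-Hölder condition with constant $C$. Furthermore, let

$$\kappa_{\mathbf n} := r_1^2\,p_n^{-1}\big(\tilde Q_{\mathbf n,(2r_1-1,2r_2)} - \tilde Q_{\mathbf n,\mathbf r}^2\big) + r_2^2\,(1-p_n)^{-1}\big(\tilde Q_{\mathbf n,(2r_1,2r_2-1)} - \tilde Q_{\mathbf n,\mathbf r}^2\big)$$

and define $\tilde\kappa_{\mathbf n} := \max(\kappa_{\mathbf n}, 1/n)$. It follows from Theorem 2 and Slutsky's theorem that $\tilde\kappa_{\mathbf n}$ is a consistent estimator of the asymptotic variance $\kappa$. Denote by $\tilde H_{\mathbf n} := \log(\max(\tilde Q_{\mathbf n}, 1/n))/(1-r)$ an estimator of $h_{\mathbf r} := \log(q_{\mathbf r})/(1-r)$. Let $L(n)$ be a slowly varying function. We obtain the following asymptotic result.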
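Theorem 3. Let $p_X(x), p_Y(x) \in H^{(\alpha)}(C)$.

(i) Then the bias $|\tilde q_{\mathbf r,\varepsilon} - q_{\mathbf r}| \le C_1\varepsilon^\alpha$, $C_1 > 0$.

(ii) If $0 < \alpha \le d/2$ and $\varepsilon \sim c\,n^{-1/(2\alpha + d(1-1/r))}$, $0 < c < \infty$, then

$$\tilde Q_{\mathbf n} - q_{\mathbf r} = O_P\big(n^{-\alpha/(2\alpha + d(1-1/r))}\big) \quad\text{and}\quad \tilde H_{\mathbf n} - h_{\mathbf r} = O_P\big(n^{-\alpha/(2\alpha + d(1-1/r))}\big) \quad\text{as } \mathbf n \to \infty.$$

(iii) If $\alpha > d/2$, $\varepsilon \sim L(n)\,n^{-1/d}$, and $n\varepsilon^d \to \infty$, then

$$\sqrt{n}\,(\tilde Q_{\mathbf n} - q_{\mathbf r}) \xrightarrow{D} N(0,\kappa) \quad\text{and}\quad \sqrt{n}\,(\tilde Q_{\mathbf n} - q_{\mathbf r})/\tilde\kappa_{\mathbf n}^{1/2} \xrightarrow{D} N(0,1);$$
$$\sqrt{n}\,(1-r)\,\tilde Q_{\mathbf n}\,\tilde\kappa_{\mathbf n}^{-1/2}\,(\tilde H_{\mathbf n} - h_{\mathbf r}) \xrightarrow{D} N(0,1) \quad\text{as } \mathbf n \to \infty.$$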
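The radius choices in Theorem 3 are explicit functions of $n$, $d$, $r$, and $\alpha$. The Python sketch below (the function `bandwidth` and the constant `c` are hypothetical) returns $\varepsilon$ together with the corresponding error rate. For part (iii) it assumes the particular slowly varying function $L(n) = c\log n$, which tends to infinity, so that $n\varepsilon^d \to \infty$ holds; the theorem itself allows any slowly varying $L(n)$ with this property.

```python
import math

def bandwidth(n, d, r, alpha, c=1.0):
    """Radius eps and error rate suggested by Theorem 3 (illustrative sketch).

    alpha <= d/2: eps = c * n^(-1/(2*alpha + d*(1 - 1/r))), part (ii), with
                  error rate n^(-alpha/(2*alpha + d*(1 - 1/r))).
    alpha >  d/2: eps = L(n) * n^(-1/d), part (iii), with the assumed choice
                  L(n) = c * log(n); the error is then of order n^(-1/2).
    """
    if alpha <= d / 2:
        denom = 2 * alpha + d * (1 - 1 / r)
        return c * n ** (-1 / denom), n ** (-alpha / denom)
    return c * math.log(n) * n ** (-1 / d), n ** -0.5

for n in (10**3, 10**4, 10**5):
    print(n, bandwidth(n, d=2, r=2, alpha=1.0), bandwidth(n, d=1, r=3, alpha=2.0))
```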

3 Applications and Numerical Experiments 

3.1 Approximate Matching in Stochastic Databases 

Let tables (in a relational database) $T_1$ and $T_2$ be matrices with $m_1$ and $m_2$ i.i.d. random tuples (or records), respectively. One of the basic database operations, join, combines two tables into a third one by matching values for given columns (attributes). For example, the join condition can be the equality (equi-join) between a given pair of attributes (e.g., names) from the tables. Joins are especially important for tying together pieces of disparate information scattered throughout a database (see, e.g., Kiefer et al., 2005; Copas and Hilton, 1990, and references therein). For the approximate join, we match $\varepsilon$-close tuples, say, $d(t_1(j), t_2(i)) \le \varepsilon$, $t_k(j) \in T_k$, $k = 1, 2$, with a specified distance; see, e.g., Seleznjev and Thalheim (2008). A set of attributes $A$ in a table $T$ is called an $\varepsilon$-key (test) if there are no $\varepsilon$-close sub-tuples $t_A(j)$, $j = 1,\dots,m$. Knowledge about the set of tests ($\varepsilon$-keys) is very helpful for avoiding redundancy in identification and searching problems and for characterizing the complexity of a database design for further optimization; see, e.g., Thalheim (2000). By joining a table with itself (self-join) we also identify $\varepsilon$-keys and key properties for a set of attributes of a random table (Seleznjev and Thalheim, 2003; Leonenko and Seleznjev, 2010).
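The $\varepsilon$-key definition translates directly into a pairwise check. The minimal Python sketch below (the helper `is_eps_key` is hypothetical; rows as numeric sequences, attributes addressed by column index, and Euclidean distance are all assumptions, since the text only requires a specified distance) tests whether an attribute set $A$ is an $\varepsilon$-key of a table with $m$ rows using $O(m^2)$ distance evaluations.

```python
from itertools import combinations
from math import dist

def is_eps_key(table, attrs, eps):
    """True if no two distinct rows of `table` have eps-close sub-tuples on `attrs`.

    Rows are sequences of numbers and `attrs` is a tuple of column indices
    (illustrative assumptions); closeness means Euclidean distance <= eps.
    """
    sub = [tuple(row[a] for a in attrs) for row in table]
    return all(dist(s, t) > eps for s, t in combinations(sub, 2))

table = [(1.0, 10.0), (1.05, 20.0), (3.0, 10.02)]
print(is_eps_key(table, (0,), 0.1), is_eps_key(table, (0, 1), 0.1))  # False True
```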

The cost of join operations is usually proportional to the size of the intermediate results, and so the join order is a primary target of join optimizers for multiple (large) tables (Thalheim, 2000). The average case approach based on stochastic database modelling for optimization problems is proposed in Seleznjev and Thalheim (2008), where, for random databases, the distribution of the $\varepsilon$-join size $N_\varepsilon$ is studied. In particular, under some conditions it is shown that the average size

$$\mathbf{E}N_\varepsilon = m_1 m_2\, q_{1,1,\varepsilon} = m_1 m_2\,\varepsilon^d b_1(d)\big(e^{-h_{1,1}} + o(1)\big) \quad\text{as } \varepsilon \to 0,$$

that is, the asymptotically optimal (on average) pairs of tables are amongst the tables with maximal value of the functional $h_{1,1}$ (variability), and the corresponding estimators of $h_{1,1}$ can be used for samples $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$ from $T_1$ and $T_2$, respectively. For discrete distributions, similar results from Theorem 1 for $\varepsilon = 0$ can be applied.

3.2 Image Matching using Quadratic-entropy Measures 

Image retrieval and registration fall in the general area of pattern matching problems, where the best match to a reference or query image is to be found in a database of secondary images. The best match is expressed as a partial re-indexing of the database in decreasing order of similarity to the reference image using a similarity measure. In the context of image registration, the database corresponds to an infinite set of transformed versions of a secondary image, e.g., rotations and translations, which are compared to the reference image to register the secondary one to the reference.

Let $X$ be a $d$-dimensional random vector and let $p(x)$ and $q(x)$ denote two possible densities for $X$. In the sequel, $X$ is a feature vector constructed from the query image and a secondary image in an image database, and $p(x)$ and $q(x)$ are densities, e.g., for the query image features and the secondary image features, respectively, say, image densities. When the features are discrete valued, $p(x)$ and $q(x)$ are probability mass functions.

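The basis for entropy methods of image matching is a measure of similarity between image densities. A general entropy similarity measure is the Rényi $\alpha$-divergence, also called the Rényi $\alpha$-relative entropy, between $p(x)$ and $q(x)$,

$$D_\alpha(p,q) = \frac{1}{\alpha-1}\log\int_{\mathbb{R}^d} q(x)\Big(\frac{p(x)}{q(x)}\Big)^{\alpha}dx = \frac{1}{\alpha-1}\log\int_{\mathbb{R}^d} p^{\alpha}(x)\,q^{1-\alpha}(x)\,dx, \quad \alpha \ne 1.$$

When the density $p(x)$ is supported on a compact domain and $q(x)$ is uniform over this domain, the Rényi $\alpha$-divergence reduces, up to sign and an additive constant (the logarithm of the volume of the domain), to the Rényi $\alpha$-entropy

$$h_\alpha(p) = \frac{1}{1-\alpha}\log\int_{\mathbb{R}^d} p^{\alpha}(x)\,dx.$$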
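For discrete distributions the integrals become sums, and the relation between the two quantities can be checked numerically: against the uniform probability mass function $u$ on $K$ points, $D_\alpha(p,u) = \log K - h_\alpha(p)$. The following Python sketch (hypothetical helper names, discrete case only) verifies this identity for a three-point distribution.

```python
import math

def renyi_divergence(p, q, alpha):
    """Discrete D_alpha(p, q) = log(sum p^alpha * q^(1 - alpha)) / (alpha - 1), alpha != 1."""
    return math.log(sum(pk**alpha * qk ** (1 - alpha) for pk, qk in zip(p, q))) / (alpha - 1)

def renyi_entropy(p, alpha):
    """Discrete h_alpha(p) = log(sum p^alpha) / (1 - alpha), alpha != 1."""
    return math.log(sum(pk**alpha for pk in p)) / (1 - alpha)

p = [0.5, 0.25, 0.25]
u = [1 / 3] * 3            # uniform reference distribution on K = 3 points
for alpha in (0.5, 2.0, 3.0):
    # Both columns agree: D_alpha(p, u) = log K - h_alpha(p).
    print(alpha, renyi_divergence(p, u, alpha), math.log(3) - renyi_entropy(p, alpha))
```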



Another important example of statistical distance between distributions is given by the following nonsymmetric Bregman distance (see, e.g., Pardo, 2006)

$$B_s(p,q) = \int_{\mathbb{R}^d}\Big[q(x)^s + \frac{1}{s-1}\,p(x)^s - \frac{s}{s-1}\,p(x)\,q(x)^{s-1}\Big]dx, \quad s \ne 1,$$

or its symmetrized version

$$K_s(p,q) = \frac{1}{s}\big[B_s(p,q) + B_s(q,p)\big] = \frac{1}{s-1}\int_{\mathbb{R}^d}\big[p(x) - q(x)\big]\big[p(x)^{s-1} - q(x)^{s-1}\big]dx.$$

For $s = 2$, we get the second-order distance

$$B_2(p,q) = K_2(p,q) = \int_{\mathbb{R}^d}\big[p(x) - q(x)\big]^2dx.$$

Now, for an integer $s$, applying Theorems 1 and 3, one can obtain an asymptotically normal estimator of the Rényi $s$-entropy and a consistent estimator of the Bregman distance.
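As a small sanity check of these definitions, the Python sketch below (hypothetical helper names; discrete distributions, so integrals become sums) evaluates $B_s$ and $K_s$ and compares them with the closed forms above for $s = 2$ and $s = 3$. For $s > 1$, $B_s$ is the Bregman divergence generated by the convex function $t^s/(s-1)$, so it is non-negative in that range.

```python
def bregman(p, q, s):
    """Discrete analogue of B_s(p, q), s != 1: the integral becomes a sum over the support."""
    return sum(qk**s + pk**s / (s - 1) - s / (s - 1) * pk * qk ** (s - 1)
               for pk, qk in zip(p, q))

def bregman_sym(p, q, s):
    """K_s(p, q) = (B_s(p, q) + B_s(q, p)) / s."""
    return (bregman(p, q, s) + bregman(q, p, s)) / s

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
# s = 2: both equal the squared L2 distance sum (p - q)^2.
print(bregman(p, q, 2), sum((a - b) ** 2 for a, b in zip(p, q)))
# s = 3: K_3 equals sum (p - q)(p^2 - q^2) / (s - 1).
print(bregman_sym(p, q, 3), sum((a - b) * (a**2 - b**2) for a, b in zip(p, q)) / 2)
```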

3.3 Entropy Maximizing Distributions 

For a positive definite and symmetric matrix $\Sigma$ and $s \ne 1$, define the constants

$$m = d + \frac{2}{s-1}, \qquad C_s = (m+2)\Sigma,$$

and

$$A_s = \frac{\Gamma(m/2+1)}{|\pi C_s|^{1/2}\,\Gamma((m-d)/2+1)}.$$

Among all densities with mean $\mu$ and covariance matrix $\Sigma$, the Rényi entropy $h_s$, $s = 2, 3, \dots$, is uniquely maximized by the density (Costa et al., 2003)

$$p_s^*(x) = A_s\big(1 - (x-\mu)^T C_s^{-1}(x-\mu)\big)^{1/(s-1)}, \quad x \in \Omega_s, \qquad (1)$$

with support

$$\Omega_s = \{x \in \mathbb{R}^d : (x-\mu)^T C_s^{-1}(x-\mu) \le 1\}.$$

The distribution given by $p_s^*(x)$ belongs to the class of Student-r distributions. Let $\mathcal{K}$ be a class of $d$-dimensional density functions $p(x)$, $x \in \mathbb{R}^d$, with positive definite covariance matrix. By the procedure described in Leonenko and Seleznjev (2010), the proposed estimator of $h_s$ can be used for distribution identification problems, i.e., to test the null hypothesis $H_0$: $X_1,\dots,X_n$ is a sample from a Student-r distribution of type (1) against the alternative $H_1$: $X_1,\dots,X_n$ is a sample from any other member of $\mathcal{K}$.
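The constants above can be checked numerically in one dimension. The Python sketch below (hypothetical helper names; $d = 1$ and $s > 1$ are assumed, so the support is the bounded interval $|x - \mu| \le C_s^{1/2}$) evaluates $p_s^*(x)$ from (1), where the exponent $1/(s-1)$ is the one for which $A_s$ is the normalizing constant, and uses a midpoint rule to confirm that the density integrates to one and has variance $\Sigma$.

```python
import math

def student_r_density_1d(x, mu, sigma2, s):
    """p_s^*(x) from (1) for d = 1 and s > 1; zero outside |x - mu| <= sqrt(C_s)."""
    d = 1
    m = d + 2 / (s - 1)
    c_s = (m + 2) * sigma2                                   # C_s = (m + 2) * Sigma
    a_s = math.gamma(m / 2 + 1) / (math.sqrt(math.pi * c_s) * math.gamma((m - d) / 2 + 1))
    u = 1 - (x - mu) ** 2 / c_s
    return a_s * u ** (1 / (s - 1)) if u > 0 else 0.0

def mass_and_variance(mu, sigma2, s, steps=100_000):
    """Midpoint-rule approximations of the total mass and the variance."""
    half_width = math.sqrt((1 + 2 / (s - 1) + 2) * sigma2)  # sqrt(C_s) for d = 1
    h = 2 * half_width / steps
    mass = var = 0.0
    for i in range(steps):
        x = mu - half_width + (i + 0.5) * h
        w = student_r_density_1d(x, mu, sigma2, s) * h
        mass += w
        var += w * (x - mu) ** 2
    return mass, var

for s in (2, 3, 5):
    print(s, mass_and_variance(0.0, 2.0, s))   # expect mass near 1 and variance near 2
```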

3.4 Numerical Experiments 

Example 1. Figure 1 shows the accuracy of the estimator for the cubic Rényi entropy $h_{3,0}$ of discrete distributions in Theorem 1 for a sample of $n$ observations from a $d$-dimensional Bernoulli distribution with $Be(p)$-i.i.d. components, $d = 3$, $n = 300$, $p = 0.8$. Here the coincidence probability $q_{3,0} = (p^3 + (1-p)^3)^3$ and the Rényi entropy $h_{3,0} = -\log(q_{3,0})/2$. The histogram of the normalized residuals $r_i := 2\sqrt{n}\,Q_{\mathbf n}(H_{\mathbf n} - h_{3,0})/\tilde\kappa_{\mathbf n}^{1/2}$, $i = 1,\dots,N_{sim}$, is compared to the standard normal density, $N_{sim} = 500$. The corresponding qq-plot and the p-values of the Kolmogorov-Smirnov (0.4948) and Shapiro-Wilk (0.7292) tests also support the normality hypothesis for the obtained residuals.
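The closed form for $q_{3,0}$ in Example 1 follows because the $d$ components are independent, so the sum over $\{0,1\}^d$ factorizes. A short Python check (illustrative only, not part of the reported experiment) compares brute-force enumeration with the closed form and evaluates $h_{3,0}$.

```python
import math
from itertools import product

p, d = 0.8, 3

# Brute force: q_{3,0} = sum over k in {0,1}^d of P(X = k)^3, with i.i.d. Be(p) components.
q_enum = sum(
    math.prod(p if bit else 1 - p for bit in k) ** 3
    for k in product((0, 1), repeat=d)
)
q_closed = (p**3 + (1 - p) ** 3) ** d   # closed form used in Example 1
h_30 = -math.log(q_closed) / 2          # h_{3,0} = log(q_{3,0}) / (1 - 3)
print(q_enum, q_closed, h_30)
```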




Figure 1: Bernoulli $d$-dimensional distribution; $d = 3$, $Be(p)$-i.i.d. components, $p = 0.8$, sample size $n = 200$. Standard normal approximation for the empirical distribution (histogram) for the normalized residuals, $N_{sim} = 500$.
Example 2. Figure 2 illustrates the performance of the approximation for the differential variability $v = h_{1,1}$ in Theorem 3 for two one-dimensional samples from the normal distributions $N(0, 3/2)$ and $N(2, 1/2)$, with sample sizes $n_1 = 100$ and $n_2 = 200$, respectively. Here the variability $v = \log(2\sqrt{\pi}\,e)$. The normalized residuals are compared to the standard normal density, $N_{sim} = 300$. The qq-plot and the p-values of the Kolmogorov-Smirnov (0.9916) and Shapiro-Wilk (0.5183) tests also support the normal approximation.
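The closed form $v = \log(2\sqrt{\pi}\,e)$ in Example 2 can be checked numerically, reading $N(\mu, \sigma^2)$ with the variance as second parameter: the integral of the product of the two normal densities equals the $N(0, 3/2 + 1/2)$ density at $2$, namely $e^{-1}/(2\sqrt{\pi})$. The Python sketch below (illustrative only) approximates $q_{1,1}$ with a midpoint rule.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var), with var the variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

steps, a, b = 200_000, -15.0, 15.0
h = (b - a) / steps
q11 = h * sum(
    normal_pdf(x, 0.0, 1.5) * normal_pdf(x, 2.0, 0.5)
    for x in (a + (i + 0.5) * h for i in range(steps))
)
v_numeric = -math.log(q11)                            # v = h_{1,1} = -log q_{1,1}
v_closed = math.log(2 * math.sqrt(math.pi) * math.e)
print(v_numeric, v_closed)
```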



[Figure 2 panels: "Histogram of res" and "Normal Q-Q Plot"]

Figure 2: Two Gaussian distributions; $N(0, 3/2)$, $N(2, 1/2)$, $n_1 = 100$, $n_2 = 200$, $\varepsilon = 1/10$. Standard normal approximation for the empirical distribution (histogram) for the normalized residuals, $N_{sim} = 300$.



Example 3. Figure 3 shows the accuracy of the normal approximation for the cubic Rényi entropy $h_{3,0}$ in Theorem 3 for a sample of $n = 300$ observations from a bivariate Gaussian distribution with $N(0,1)$-i.i.d. components. Here the Rényi entropy $h_{3,0} = \log(\sqrt{12}\,\pi)$. The histogram, the qq-plot, and the p-values of the Kolmogorov-Smirnov (0.2107) and Shapiro-Wilk (0.2868) tests are consistent with the hypothesis of standard normality for the residuals, $N_{sim} = 300$.
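Similarly, for Example 3 the bivariate integral $q_{3,0} = \int_{\mathbb{R}^2} \phi_2(x)^3\,dx$ of the standard bivariate normal density factorizes into the square of a one-dimensional integral, with $\int_{\mathbb{R}} \phi(t)^3\,dt = 1/(2\sqrt{3}\,\pi)$, which gives $h_{3,0} = \log(\sqrt{12}\,\pi)$. The Python check below (illustrative only) uses a midpoint rule.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

steps, a, b = 100_000, -10.0, 10.0
h = (b - a) / steps
int1 = sum(phi(a + (i + 0.5) * h) ** 3 for i in range(steps)) * h   # integral of phi^3 over R
q30 = int1 ** 2                      # independent components: the 2-D integral factorizes
h30 = math.log(q30) / (1 - 3)        # h_{3,0} = log(q_{3,0}) / (1 - r) with r = 3
print(h30, math.log(math.sqrt(12) * math.pi))
```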

Example 4. Figure 4 demonstrates the behaviour of the estimator for the quadratic Bregman distance $B_2(p,q)$ for two exponential distributions $p(x) = \beta_1 e^{-\beta_1 x}$, $x \ge 0$, and $q(x) = \beta_2 e^{-\beta_2 x}$, $x \ge 0$, with rate parameters $\beta_1 = 1$ and $\beta_2 = 3$, respectively, and equal sample sizes. Here the Bregman distance $B_2(p,q) = 1/2$. The empirical mean squared error (MSE) based on 10000 independent simulations is calculated for different values of $n$.
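An estimator of this kind can be assembled from normalized $U$-statistics, since $B_2(p,q) = \int p^2 + \int q^2 - 2\int pq$. The minimal Python sketch below (hypothetical helper names; not the code used for Figure 4) assumes $d = 1$, so $b_\varepsilon(1) = 2\varepsilon$, uses a fixed radius $\varepsilon$, and estimates $\int q^2$ by applying the one-sample statistic with $\mathbf r = (2,0)$ to the $Y$-sample. Pair counts are obtained in $O(n \log n)$ time by sorting.

```python
import bisect
import random

def _count_close(sorted_vals, x, eps):
    """Number of values v in sorted_vals with |v - x| <= eps."""
    return bisect.bisect_right(sorted_vals, x + eps) - bisect.bisect_left(sorted_vals, x - eps)

def q2_within(xs, eps):
    """Normalized U-statistic for the integral of p^2 (r = (2, 0), d = 1)."""
    s = sorted(xs)
    pairs = sum(_count_close(s, x, eps) - 1 for x in xs)   # drop the self-match i = j
    return pairs / (len(xs) * (len(xs) - 1)) / (2 * eps)

def q11_cross(xs, ys, eps):
    """Normalized U-statistic for the integral of p * q (r = (1, 1), d = 1)."""
    s = sorted(ys)
    pairs = sum(_count_close(s, x, eps) for x in xs)
    return pairs / (len(xs) * len(ys)) / (2 * eps)

def bregman2_estimate(xs, ys, eps):
    """Plug-in estimate of B_2(p, q) = int p^2 + int q^2 - 2 int p q."""
    return q2_within(xs, eps) + q2_within(ys, eps) - 2 * q11_cross(xs, ys, eps)

random.seed(4)
xs = [random.expovariate(1.0) for _ in range(5000)]   # p: rate beta_1 = 1
ys = [random.expovariate(3.0) for _ in range(5000)]   # q: rate beta_2 = 3
print(bregman2_estimate(xs, ys, eps=0.02))            # target value 1/2
```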

4 Proofs 

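Lemma 1. Assume that $p_X(x)$ and $p_Y(x)$ are bounded and continuous, or with a finite number of discontinuity points. Let $a, b \ge 0$. Then

$$b_\varepsilon(d)^{-a-b}\,\mathbf{E}\big(p_{X,\varepsilon}(X)^a\,p_{Y,\varepsilon}(X)^b\big) \to \int_{\mathbb{R}^d} p_X(x)^{a+1}\,p_Y(x)^b\,dx \quad\text{as } \varepsilon \to 0.$$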
[Figure 3 panels: "Histogram of res" and "Normal Q-Q Plot"]

Figure 3: Bivariate normal distribution with $N(0,1)$-i.i.d. components; sample size $n = 300$, $\varepsilon = 1/2$. Standard normal approximation for the empirical distribution (histogram) for the normalized residuals, $N_{sim} = 300$.

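Proof: We have

$$b_\varepsilon(d)^{-a-b}\,\mathbf{E}\big(p_{X,\varepsilon}(X)^a\,p_{Y,\varepsilon}(X)^b\big) = \mathbf{E}\big(g_\varepsilon(X)\big),$$

where $g_\varepsilon(x) := (p_{X,\varepsilon}(x)/b_\varepsilon(d))^a\,(p_{Y,\varepsilon}(x)/b_\varepsilon(d))^b$. It follows by definition that $g_\varepsilon(x) \to p_X(x)^a\,p_Y(x)^b$ as $\varepsilon \to 0$ for all continuity points of $p_X(x)$ and $p_Y(x)$, i.e., everywhere except on a finite (hence Lebesgue-null) set, and that the random variable $g_\varepsilon(X)$ is bounded. Hence, the bounded convergence theorem implies

$$\mathbf{E}\big(g_\varepsilon(X)\big) \to \mathbf{E}\big(p_X(X)^a\,p_Y(X)^b\big) = \int_{\mathbb{R}^d} p_X(x)^{a+1}\,p_Y(x)^b\,dx \quad\text{as } \varepsilon \to 0.$$

□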

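Proof of Theorem 2. (i) Note that, for $k = 1,\dots,r$, $\varepsilon \le 1$, and $n\varepsilon^{d(1-1/r)} \ge 1$,

$$n^k\varepsilon^{d(k-1)} = \big(n\varepsilon^{d(1-1/k)}\big)^k \ge \big(n\varepsilon^{d(1-1/r)}\big)^k \ge n\varepsilon^{d(1-1/r)}. \qquad (2)$$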

We use the conventional results from the theory of $U$-statistics (see, e.g., Lee, 1990; Koroljuk and Borovskich, 1994). For $l = 0,\dots,r_1$ and $m = 0,\dots,r_2$, define

$$\psi_{l,m,\mathbf n}(x_1,\dots,x_l; y_1,\dots,y_m) := \mathbf{E}\,\psi_{\mathbf n}(x_1,\dots,x_l,X_{l+1},\dots,X_{r_1}; y_1,\dots,y_m,Y_{m+1},\dots,Y_{r_2})$$
$$= \frac{1}{r_1}\sum_{i=1}^{r_1}\mathbf{E}\,\psi^{(i)}_{\mathbf n}(x_1,\dots,x_l,X_{l+1},\dots,X_{r_1}; y_1,\dots,y_m,Y_{m+1},\dots,Y_{r_2}), \qquad (3)$$

and

$$\sigma^2_{lm,\varepsilon} := \mathrm{Var}\big(\psi_{l,m,\mathbf n}(X_1,\dots,X_l; Y_1,\dots,Y_m)\big).$$

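Let $S_1, S_2 \in \mathcal{S}_{n_1,r_1}$ and $T_1, T_2 \in \mathcal{S}_{n_2,r_2}$ have $l$ and $m$ elements in common, respectively. By properties of $U$-statistics, we have

$$v_{\mathbf n}^2 = \mathrm{Var}(\tilde Q_{\mathbf n}) = b_\varepsilon(d)^{-2(r-1)}\sum_{l=0}^{r_1}\sum_{m=0}^{r_2}\frac{\binom{r_1}{l}\binom{n_1-r_1}{r_1-l}\binom{r_2}{m}\binom{n_2-r_2}{r_2-m}}{\binom{n_1}{r_1}\binom{n_2}{r_2}}\,\sigma^2_{lm,\varepsilon} \qquad (4)$$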
Figure 4: Bregman distance for $Exp(\beta_1)$ and $Exp(\beta_2)$, $\beta_1 = 1$, $\beta_2 = 3$. The empirical MSE obtained for the $U$-statistic estimator with $n\varepsilon = a$, for different values of $a$.

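and

$$\sigma^2_{lm,\varepsilon} = \mathrm{Cov}\big(\psi_{\mathbf n}(S_1;T_1), \psi_{\mathbf n}(S_2;T_2)\big). \qquad (5)$$

From (5) we get that $0 \le \sigma^2_{lm,\varepsilon} \le \mathbf{E}\big(\psi_{\mathbf n}(S_1;T_1)\,\psi_{\mathbf n}(S_2;T_2)\big)$, which is a finite linear combination of $P(A_i \cap A_j)$, $i \in S_1$, $j \in S_2$, where

$$A_i := \{d(X_i,X_k) \le \varepsilon,\ d(X_i,Y_s) \le \varepsilon,\ \forall k \in S_1,\ \forall s \in T_1\},$$

and $A_j$ is defined analogously with $S_2$ and $T_2$. When $l \ne 0$ or $m \ne 0$, the triangle inequality implies that

$$A_i \cap A_j \subseteq F_i := \{d(X_i,X_k) \le 3\varepsilon,\ d(X_i,Y_s) \le 3\varepsilon,\ \forall k \in S_1 \cup S_2,\ \forall s \in T_1 \cup T_2\},$$

and since $|S_1 \cup S_2| = 2r_1 - l$ and $|T_1 \cup T_2| = 2r_2 - m$, it follows by conditioning and from Lemma 1 that

$$P(A_i \cap A_j) \le P(F_i) = \mathbf{E}\big(p_{X,3\varepsilon}(X_i)^{2r_1-l-1}\,p_{Y,3\varepsilon}(X_i)^{2r_2-m}\big) \sim 3^{d(2r-l-m-1)}\,b_\varepsilon(d)^{2r-l-m-1}\,q_{2r_1-l,2r_2-m} \quad\text{as } \mathbf n \to \infty.$$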

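We conclude that

$$\sigma^2_{lm,\varepsilon} = O\big(b_\varepsilon(d)^{2r-l-m-1}\big) \quad\text{as } \mathbf n \to \infty. \qquad (6)$$

Now, for $l = 0,\dots,r_1$ and $m = 0,\dots,r_2$ with $l + m \ge 1$, we obtain

$$\frac{\binom{r_1}{l}\binom{n_1-r_1}{r_1-l}\binom{r_2}{m}\binom{n_2-r_2}{r_2-m}}{\binom{n_1}{r_1}\binom{n_2}{r_2}}\,\sigma^2_{lm,\varepsilon} \sim C_{lm}\,n^{-(l+m)}\,\sigma^2_{lm,\varepsilon} \quad\text{as } \mathbf n \to \infty, \qquad (7)$$

for some constant $C_{lm} > 0$. Hence, from (2), (4), (6), and (7) we get that $v_{\mathbf n}^2 = O\big((n\varepsilon^{d(1-1/r)})^{-1}\big)$ as $\mathbf n \to \infty$. Moreover, it follows from Lemma 1 that $\mathbf{E}\tilde Q_{\mathbf n} \to q_{\mathbf r}$ as $\mathbf n \to \infty$, so when $n\varepsilon^{d(1-1/r)} \to \infty$, then

$$\mathbf{E}(\tilde Q_{\mathbf n} - q_{\mathbf r})^2 = v_{\mathbf n}^2 + \big(\mathbf{E}\tilde Q_{\mathbf n} - q_{\mathbf r}\big)^2 \to 0,$$

and the assertion follows.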



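(ii) Let

$$h^{(1,0)}_{\mathbf n}(x) := \psi_{1,0,\mathbf n}(x)/b_\varepsilon(d)^{r-1} - \tilde q_{\mathbf r,\varepsilon}, \qquad h^{(0,1)}_{\mathbf n}(y) := \psi_{0,1,\mathbf n}(y)/b_\varepsilon(d)^{r-1} - \tilde q_{\mathbf r,\varepsilon}. \qquad (8)$$

The H-decomposition of $\tilde Q_{\mathbf n}$ is given by

$$\tilde Q_{\mathbf n} = \tilde q_{\mathbf r,\varepsilon} + r_1 H^{(1,0)}_{\mathbf n} + r_2 H^{(0,1)}_{\mathbf n} + R_{\mathbf n}, \qquad (9)$$

where

$$H^{(1,0)}_{\mathbf n} := \frac{1}{n_1}\sum_{i=1}^{n_1} h^{(1,0)}_{\mathbf n}(X_i), \qquad H^{(0,1)}_{\mathbf n} := \frac{1}{n_2}\sum_{j=1}^{n_2} h^{(0,1)}_{\mathbf n}(Y_j).$$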

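The terms in (9) are uncorrelated, and since $\mathrm{Var}(h^{(1,0)}_{\mathbf n}(X_1)) = b_\varepsilon(d)^{-2(r-1)}\sigma^2_{10,\varepsilon}$ and $\mathrm{Var}(h^{(0,1)}_{\mathbf n}(Y_1)) = b_\varepsilon(d)^{-2(r-1)}\sigma^2_{01,\varepsilon}$, we obtain from (4) that

$$\mathrm{Var}(R_{\mathbf n}) = \mathrm{Var}(\tilde Q_{\mathbf n}) - \mathrm{Var}(r_1 H^{(1,0)}_{\mathbf n}) - \mathrm{Var}(r_2 H^{(0,1)}_{\mathbf n})$$
$$= \mathrm{Var}(\tilde Q_{\mathbf n}) - b_\varepsilon(d)^{-2(r-1)}\,r_1^2\,n_1^{-1}\sigma^2_{10,\varepsilon} - b_\varepsilon(d)^{-2(r-1)}\,r_2^2\,n_2^{-1}\sigma^2_{01,\varepsilon}$$
$$= b_\varepsilon(d)^{-2(r-1)}\Big(\sum_{(l,m)\in E}\frac{\binom{r_1}{l}\binom{n_1-r_1}{r_1-l}\binom{r_2}{m}\binom{n_2-r_2}{r_2-m}}{\binom{n_1}{r_1}\binom{n_2}{r_2}}\,\sigma^2_{lm,\varepsilon} + K_{1,\mathbf n}\,\sigma^2_{10,\varepsilon} + K_{2,\mathbf n}\,\sigma^2_{01,\varepsilon}\Big), \qquad (10)$$

where $E := \{(l,m) : 0 \le l \le r_1,\ 0 \le m \le r_2,\ l + m \ge 2\}$, and

$$K_{1,\mathbf n} := \frac{r_1\binom{n_1-r_1}{r_1-1}\binom{n_2-r_2}{r_2}}{\binom{n_1}{r_1}\binom{n_2}{r_2}} - \frac{r_1^2}{n_1}, \qquad K_{2,\mathbf n} := \frac{r_2\binom{n_1-r_1}{r_1}\binom{n_2-r_2}{r_2-1}}{\binom{n_1}{r_1}\binom{n_2}{r_2}} - \frac{r_2^2}{n_2}.$$

Note that $K_{1,\mathbf n}, K_{2,\mathbf n} = O(n^{-2})$ as $\mathbf n \to \infty$, so if $n\varepsilon^d \to a$, $0 < a \le \infty$, then (6), (7), and (10) imply that $\mathrm{Var}(R_{\mathbf n}) = O\big((n^2\varepsilon^d)^{-1}\big)$ as $\mathbf n \to \infty$. In particular, for $a = \infty$,

$$\mathrm{Var}(R_{\mathbf n}) = o(n^{-1}) \;\Rightarrow\; n^{1/2}R_{\mathbf n} \xrightarrow{P} 0 \quad\text{as } \mathbf n \to \infty. \qquad (11)$$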
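By symmetry, we have from (3)

$$\psi_{1,0,\mathbf n}(x) = \frac{1}{r_1}\Big(p_{X,\varepsilon}(x)^{r_1-1}p_{Y,\varepsilon}(x)^{r_2} + (r_1-1)\,\mathbf{E}\big(\psi^{(2)}_{\mathbf n}(x, X_2,\dots,X_{r_1}; Y_1,\dots,Y_{r_2})\big)\Big). \qquad (12)$$

Let $x$ be a continuity point of $p_X(x)$ and $p_Y(x)$. Then, changing variables $y = x + \varepsilon u$ and the bounded convergence theorem give

$$\mathbf{E}\big(\psi^{(2)}_{\mathbf n}(x, X_2,\dots,X_{r_1}; Y_1,\dots,Y_{r_2})\big) = \mathbf{E}\Big(\mathbf{E}\big(\psi^{(2)}_{\mathbf n}(x, X_2,\dots,X_{r_1}; Y_1,\dots,Y_{r_2}) \,\big|\, X_2\big)\Big)$$
$$= \int_{\mathbb{R}^d} I(d(x,y) \le \varepsilon)\,p_{X,\varepsilon}(y)^{r_1-2}\,p_{Y,\varepsilon}(y)^{r_2}\,p_X(y)\,dy$$
$$= \varepsilon^d\int_{\mathbb{R}^d} I(d(0,u) \le 1)\,p_{X,\varepsilon}(x+\varepsilon u)^{r_1-2}\,p_{Y,\varepsilon}(x+\varepsilon u)^{r_2}\,p_X(x+\varepsilon u)\,du \qquad (13)$$
$$\sim b_\varepsilon(d)^{r-1}\,p_X(x)^{r_1-1}\,p_Y(x)^{r_2} \quad\text{as } \mathbf n \to \infty.$$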
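From (12) and (13) we get that

$$\lim_{\mathbf n\to\infty} \psi_{1,0,\mathbf n}(x)/b_\varepsilon(d)^{r-1} = p_X(x)^{r_1-1}\,p_Y(x)^{r_2},$$

and hence, since $\tilde q_{\mathbf r,\varepsilon} \to q_{\mathbf r}$ by Lemma 1,

$$\lim_{\mathbf n\to\infty} h^{(1,0)}_{\mathbf n}(x) = p_X(x)^{r_1-1}\,p_Y(x)^{r_2} - q_{\mathbf r}, \qquad (14)$$

and similarly,

$$\lim_{\mathbf n\to\infty} h^{(0,1)}_{\mathbf n}(x) = p_X(x)^{r_1}\,p_Y(x)^{r_2-1} - q_{\mathbf r}. \qquad (15)$$

Let $\max(p_X(x), p_Y(x)) \le C$, $x \in \mathbb{R}^d$. Then $\max(p_{X,\varepsilon}(x), p_{Y,\varepsilon}(x)) \le b_\varepsilon(d)\,C$, $x \in \mathbb{R}^d$. It follows from (12) and (13) that $\psi_{1,0,\mathbf n}(x) \le b_\varepsilon(d)^{r-1}C^{r-1}$, $x \in \mathbb{R}^d$, and hence $|h^{(1,0)}_{\mathbf n}(x)| \le 2C^{r-1}$, $x \in \mathbb{R}^d$. Similarly, we have that $|h^{(0,1)}_{\mathbf n}(x)| \le 2C^{r-1}$, $x \in \mathbb{R}^d$. Therefore, $h^{(1,0)}_{\mathbf n}(X_1)$ and $h^{(0,1)}_{\mathbf n}(Y_1)$ are bounded random variables. Hence, from (14), (15), and the bounded convergence theorem we obtain

$$\mathrm{Var}\big(h^{(1,0)}_{\mathbf n}(X_1)\big) \to \zeta_{1,0}, \qquad \mathrm{Var}\big(h^{(0,1)}_{\mathbf n}(Y_1)\big) \to \zeta_{0,1} \quad\text{as } \mathbf n \to \infty.$$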

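Let $Z_{\mathbf n,i} := n_1^{-1/2}h^{(1,0)}_{\mathbf n}(X_i)$, $i = 1,\dots,n_1$, and observe that, for $\delta > 0$,

$$\sum_{i=1}^{n_1}\mathbf{E}Z_{\mathbf n,i}^2 = \mathrm{Var}\big(h^{(1,0)}_{\mathbf n}(X_1)\big) \to \zeta_{1,0} > 0 \quad\text{as } \mathbf n \to \infty,$$

and

$$\lim_{\mathbf n\to\infty}\sum_{i=1}^{n_1}\mathbf{E}\big(|Z_{\mathbf n,i}|^2 I(|Z_{\mathbf n,i}| > \delta)\big) = \lim_{\mathbf n\to\infty}\mathbf{E}\big(|h^{(1,0)}_{\mathbf n}(X_1)|^2\,I(|h^{(1,0)}_{\mathbf n}(X_1)| > \delta n_1^{1/2})\big) \le \lim_{\mathbf n\to\infty} 4C^{2(r-1)}\,\mathbf{E}\big(I(|h^{(1,0)}_{\mathbf n}(X_1)| > \delta n_1^{1/2})\big) = 0,$$

where the last equality follows from the boundedness of $h^{(1,0)}_{\mathbf n}(X_1)$. The Lindeberg-Feller theorem (see, e.g., Theorem 4.6, Durrett, 1991) gives that

$$Z_{\mathbf n,1} + \dots + Z_{\mathbf n,n_1} = n_1^{1/2}H^{(1,0)}_{\mathbf n} \xrightarrow{D} N(0,\zeta_{1,0}) \quad\text{as } \mathbf n \to \infty,$$

and similarly $n_2^{1/2}H^{(0,1)}_{\mathbf n} \xrightarrow{D} N(0,\zeta_{0,1})$ as $\mathbf n \to \infty$. Hence, since $n/n_1 \to p^{-1}$ and $n/n_2 \to (1-p)^{-1}$, by independence we get that

$$n^{1/2}\big(r_1H^{(1,0)}_{\mathbf n} + r_2H^{(0,1)}_{\mathbf n}\big) \xrightarrow{D} N\big(0,\ r_1^2p^{-1}\zeta_{1,0} + r_2^2(1-p)^{-1}\zeta_{0,1}\big) = N(0,\kappa),$$

so from (11) and Slutsky's theorem,

$$n^{1/2}\big(\tilde Q_{\mathbf n} - \tilde q_{\mathbf r,\varepsilon}\big) = n^{1/2}\big(r_1H^{(1,0)}_{\mathbf n} + r_2H^{(0,1)}_{\mathbf n}\big) + n^{1/2}R_{\mathbf n} \xrightarrow{D} N(0,\kappa) \quad\text{as } \mathbf n \to \infty.$$

This completes the proof. □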
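Proof of Theorem 3. The proof is similar to that of the corresponding result in Leonenko and Seleznjev (2010), so we give the main steps only. First we evaluate the bias term $\delta_{\mathbf n} := \tilde q_{\mathbf r,\varepsilon} - q_{\mathbf r}$. Let $V := (V_1,\dots,V_d)'$ be an auxiliary random vector uniformly distributed in the unit ball $B_1(0)$, say, $V \in U(B_1(0))$. Then by definition, we have

$$\bar p_{X,\varepsilon}(x) := p_{X,\varepsilon}(x)/b_\varepsilon(d) = \mathbf{E}\big(p_X(x-\varepsilon V)\big), \qquad \bar p_{Y,\varepsilon}(x) := p_{Y,\varepsilon}(x)/b_\varepsilon(d) = \mathbf{E}\big(p_Y(x-\varepsilon V)\big),$$

and

$$\delta_{\mathbf n} = \int_{\mathbb{R}^d}\bar p_{X,\varepsilon}(x)^{r_1-1}\,\bar p_{Y,\varepsilon}(x)^{r_2}\,p_X(x)\,dx - \int_{\mathbb{R}^d}p_X(x)^{r_1}\,p_Y(x)^{r_2}\,dx = \mathbf{E}\big(D_\varepsilon(X)\big),$$

where

$$D_\varepsilon(x) := \bar p_{X,\varepsilon}(x)^{r_1-1}\,\bar p_{Y,\varepsilon}(x)^{r_2} - p_X(x)^{r_1-1}\,p_Y(x)^{r_2}$$
$$= \bar p_{X,\varepsilon}(x)^{r_1-1}\big(\bar p_{Y,\varepsilon}(x)^{r_2} - p_Y(x)^{r_2}\big) + p_Y(x)^{r_2}\big(\bar p_{X,\varepsilon}(x)^{r_1-1} - p_X(x)^{r_1-1}\big).$$

It follows by definition that

$$D_\varepsilon(x) = P_1(x)\big(\bar p_{Y,\varepsilon}(x) - p_Y(x)\big) + P_2(x)\big(\bar p_{X,\varepsilon}(x) - p_X(x)\big) = \mathbf{E}\Big(P_1(x)\big(p_Y(x-\varepsilon V) - p_Y(x)\big) + P_2(x)\big(p_X(x-\varepsilon V) - p_X(x)\big)\Big),$$

where $P_1(x)$ and $P_2(x)$ are polynomials in $p_X(x)$, $p_Y(x)$, $\mathbf{E}(p_X(x-\varepsilon V))$, and $\mathbf{E}(p_Y(x-\varepsilon V))$. Now the boundedness of $p_X(x)$ and $p_Y(x)$ and the Hölder condition, applied directly for $0 < \alpha \le 1$ and to the partial derivatives for $1 < \alpha \le 2$ (where the first-order term vanishes because $\mathbf{E}V = 0$), imply

$$|D_\varepsilon(x)| \le C_1\varepsilon^\alpha, \quad C_1 > 0,$$

and the assertion (i) follows.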

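For $\varepsilon \sim c\,n^{-1/(2\alpha + d(1-1/r))}$, $0 < c < \infty$, $\alpha < d/2$, by (i) and Theorem 2 we have

$$\mathbf{E}(\tilde Q_{\mathbf n} - q_{\mathbf r})^2 = v_{\mathbf n}^2 + \delta_{\mathbf n}^2 = O\big(n^{-2\alpha/(2\alpha + d(1-1/r))}\big) \quad\text{as } \mathbf n \to \infty.$$

Now for some $C > 0$, any $A > 0$, and large enough $n_1, n_2$, we obtain

$$P\big(|\tilde Q_{\mathbf n} - q_{\mathbf r}| > A\,n^{-\alpha/(2\alpha + d(1-1/r))}\big) \le \frac{n^{2\alpha/(2\alpha + d(1-1/r))}\,\mathbf{E}(\tilde Q_{\mathbf n} - q_{\mathbf r})^2}{A^2} \le \frac{C}{A^2},$$

and the assertion (ii) follows. Similarly for $\alpha = d/2$.

Finally, for $\alpha > d/2$, $\varepsilon \sim L(n)\,n^{-1/d}$, and $n\varepsilon^d \to \infty$, we have $\sqrt{n}\,|\delta_{\mathbf n}| \le C_1\sqrt{n}\,\varepsilon^\alpha \sim C_1 L(n)^\alpha n^{1/2-\alpha/d} \to 0$, so the assertion (iii) follows from Theorem 2 and the Slutsky theorem. This completes the proof. □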